Integrating multiple evidence sources to predict transcription factor binding in the human genome.

Authors

  • Jason Ernst
  • Heather L Plasterer
  • Itamar Simon
  • Ziv Bar-Joseph
Abstract

The binding preferences of many transcription factors are known and are characterized by a sequence binding motif. However, determining the regions of the genome in which a transcription factor binds based on its motif is a challenging problem, particularly in species with large genomes, since many sequences contain matches to the motif yet are not bound. Several rules based on sequence conservation or on location relative to a transcription start site have been proposed to help differentiate true binding sites from random ones. Other evidence sources may also be informative for this task. We developed a method for integrating multiple evidence sources using logistic regression classifiers. Our method works in two steps. First, we infer a score quantifying the general binding preference of transcription factors at all locations based on a large set of evidence features, without using any motif-specific information. Then, we combine this general binding preference score with motif information for specific transcription factors to improve prediction of the regions bound by each factor. Using cross-validation and new experimental data, we show that, surprisingly, the general binding preference can be highly predictive of true locations of transcription factor binding even when no binding motif is used. When combined with motif information, our method outperforms previous methods for predicting locations of true binding.
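The two-step idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the data are synthetic, the feature names (`conservation`, `tss_proximity`, `motif`) are hypothetical stand-ins for the paper's evidence sources, and the choice to feed the step-one log-odds into step two is an assumption made for illustration. It uses only the Python standard library and a plain gradient-descent logistic regression.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear(w, x):
    # w holds one weight per feature, with the bias stored last.
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]

def predict(w, x):
    return sigmoid(linear(w, x))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression weights by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def accuracy(w, X, y):
    hits = sum((predict(w, xi) > 0.5) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(y)

# Synthetic regions (assumption): bound regions score higher on average
# for every evidence source, with unit-variance noise.
rng = random.Random(0)
evidence, motif, labels = [], [], []
for _ in range(300):
    bound = rng.random() < 0.5
    shift = 1.0 if bound else 0.0
    evidence.append([rng.gauss(shift, 1.0),   # hypothetical conservation score
                     rng.gauss(shift, 1.0)])  # hypothetical TSS-proximity score
    motif.append(rng.gauss(shift, 1.0))       # hypothetical motif match score
    labels.append(int(bound))

# Step 1: general binding preference from evidence features only (no motif).
general_w = fit_logistic(evidence, labels)
general_score = [linear(general_w, x) for x in evidence]  # log-odds
general_acc = accuracy(general_w, evidence, labels)

# Step 2: combine the general score with factor-specific motif information.
combined_X = [[g, m] for g, m in zip(general_score, motif)]
combined_w = fit_logistic(combined_X, labels)
combined_probs = [predict(combined_w, x) for x in combined_X]
combined_acc = accuracy(combined_w, combined_X, labels)

print("general-only training accuracy:", round(general_acc, 3))
print("general + motif training accuracy:", round(combined_acc, 3))
```

The accuracies printed here are in-sample values on toy data and say nothing about the paper's results. A real evaluation would use held-out regions, as the abstract's mention of cross-validation implies.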

Similar resources

Novel Data Fusion Method and Exploration of Multiple Information Sources for Transcription Factor Target Gene Prediction

Background. Revealing protein-DNA interactions is a key problem in understanding transcriptional regulation at mechanistic level. Computational methods have an important role in predicting transcription factor target gene genomewide. Multiple data fusion provides a natural way to improve transcription factor target gene predictions because sequence specificities alone are not sufficient to accu...

Predicting Cell Cycle Genes from E-MAP Profiles by Integrating Multiple Types of Data

Interactions between genes and proteins can be revealed by multiple experimental platforms. The derived interaction networks can be utilized to discover novel genes involved in specific biological process. E-MAP is an experimental platform to measure genetic interactions in a genome-wide scale, which successfully recovered known pathways and also revealed novel protein complexes in S. cerevisia...

Integration of Genome and Chromatin Structure with Gene Expression Profiles To Predict c-MYC Recognition Site Binding and Function

The MYC genes encode nuclear sequence specific-binding DNA-binding proteins that are pleiotropic regulators of cellular function, and the c-MYC proto-oncogene is deregulated and/or mutated in most human cancers. Experimental studies of MYC binding to the genome are not fully consistent. While many c-MYC recognition sites can be identified in c-MYC responsive genes, other motif matches-even expe...

Transcription Factor Binding Sites Prediction Based on Modified Nucleosomes

In computational methods, position weight matrices (PWMs) are commonly applied for transcription factor binding site (TFBS) prediction. Although these matrices are more accurate than simple consensus sequences to predict actual binding sites, they usually produce a large number of false positive (FP) predictions and so are impoverished sources of information. Several studies have employed addit...

Revealing the structure and dynamics of cis-regulation using heterogeneous, genome-wide, multi-species data.

Keywords: gene networks Gene regulation is based on interactions between transcription factors and their DNA binding sites. We report on three studies on the structure and dynamics of cis-regulation. (1) By studying the rate of changes of motifs in promoters of four yeast genomes, we provide a first global view of the selection forces acting in the evolution of binding sites. Our analysis [2] s...

Journal:
  • Genome Research

Volume 20, Issue 4

Publication date: 2010